Background and Context

Objective

The Aarhus University Signal Processing group, in collaboration with the University of Southern Denmark, has provided a dataset containing images of unique plants belonging to 12 different species. As a data scientist, I am building a Convolutional Neural Network (CNN) model that classifies the plant seedlings into their 12 respective categories.

The goal of the project is to create a classifier capable of determining a plant's species from an image.

List of Plant species

Normalizing Pixel Values to [0, 1]
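
Scaling to [0, 1] amounts to dividing each 8-bit pixel value by 255. The following is a minimal NumPy sketch, assuming the images are held in a `uint8` array of shape `(N, height, width, channels)`; the function name `normalize_pixels` is hypothetical.

```python
import numpy as np

def normalize_pixels(images: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to floats in [0, 1]."""
    # Cast first so the division happens in floating point, not integer arithmetic.
    return images.astype(np.float32) / 255.0

# Hypothetical batch: 2 RGB images of 4x4 pixels with values 0..255.
batch = np.random.default_rng(0).integers(0, 256, size=(2, 4, 4, 3), dtype=np.uint8)
scaled = normalize_pixels(batch)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)  # True True
```

Using `float32` rather than `float64` halves memory use, which matters once the full image set is loaded.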

Visualizing images using Gaussian Blur

Normalizing the pixel values and applying Gaussian blur produced a similar visual effect on the images.
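
Gaussian blur replaces each pixel with a weighted average of its neighbors, using weights from a 2-D Gaussian; because that kernel is separable, it can be applied as two 1-D passes. The notebook's exact call and kernel size are not shown (a library routine such as OpenCV's `cv2.GaussianBlur` would be the usual choice), so this is a minimal NumPy sketch with hypothetical helper names and an assumed default of `sigma=1.0`.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Sampled 1-D Gaussian, normalized so the weights sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Blur an (H, W) or (H, W, C) image with a separable Gaussian kernel."""
    radius = max(1, int(3 * sigma))  # about 3 sigma captures almost all the weight
    kernel = gaussian_kernel_1d(sigma, radius)
    img = image.astype(np.float64)
    if img.ndim == 2:
        img = img[:, :, None]  # treat grayscale as a single channel
    # Reflect-pad rows and columns so the output keeps the input's height and width.
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="reflect")
    # Separable filter: one 1-D pass along the width (axis 1), then along the height (axis 0).
    tmp = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, tmp)
    return out[:, :, 0] if image.ndim == 2 else out

# Usage: blur a hypothetical normalized RGB image.
rgb = np.random.default_rng(0).random((10, 12, 3))
blurred = gaussian_blur(rgb, sigma=1.0)
```

Reflect padding is only one edge-handling choice; constant or nearest-pixel padding would work equally well for visualization.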

Model 1 - Without Data Augmentation

This confirms that the integer labels map to the correct plant species. Per-class accuracy is highest for Charlock and Small-flowered Cranesbill and lowest for Black-grass.
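
Reading "accuracy" per species as the share of that species' test images that are classified correctly, the figures come from the diagonal of the confusion matrix divided by its row sums (that is, per-class recall). The following is a minimal NumPy sketch under that assumption. The helper names and toy labels are hypothetical, and the integer labels are assumed to follow alphabetical order of species names, which is how scikit-learn's `LabelEncoder` assigns them.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (true, pred) pair repeats.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_accuracy(cm):
    """Fraction of each true class predicted correctly (per-class recall)."""
    support = cm.sum(axis=1)
    # Classes with no test samples get 0.0 instead of a division-by-zero warning.
    return np.divide(np.diag(cm), support, out=np.zeros(len(cm)), where=support > 0)

# Toy example with three species named in the text (hypothetical predictions).
classes = ["Black-grass", "Charlock", "Small-flowered Cranesbill"]
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 1, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, len(classes))
for name, acc in zip(classes, per_class_accuracy(cm)):
    print(f"{name}: {acc:.2f}")
# Black-grass: 0.25
# Charlock: 1.00
# Small-flowered Cranesbill: 0.67
```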

Visualize predictions from Model 1

Data Augmentation Method

In Model 2, Charlock again has the highest per-class accuracy and Black-grass the lowest.
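
The text does not list which transforms Model 2 used. As a hedged sketch, assuming random horizontal and vertical flips plus 90-degree rotations (a common choice for top-down plant images, not necessarily what the notebook did; in Keras one would typically use built-in layers such as `tf.keras.layers.RandomFlip` and `RandomRotation`), on-the-fly augmentation in NumPy can look like this. The helper names are hypothetical.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a randomly flipped and rotated copy of an (H, W, C) image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip (reverse the width axis)
    if rng.random() < 0.5:
        out = out[::-1, :]  # vertical flip (reverse the height axis)
    # Quarter turns would swap height and width of non-square images,
    # so those only get 0 or 180 degrees.
    square = out.shape[0] == out.shape[1]
    k = int(rng.integers(0, 4)) if square else 2 * int(rng.integers(0, 2))
    out = np.rot90(out, k=k, axes=(0, 1))
    return np.ascontiguousarray(out)

def augmented_batches(images, labels, batch_size, rng):
    """Yield shuffled batches of freshly augmented images; labels are unchanged."""
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        batch = np.stack([augment(images[i], rng) for i in idx])
        yield batch, labels[idx]

# Usage with hypothetical data: 5 square RGB images and integer labels.
rng = np.random.default_rng(0)
images = rng.random((5, 8, 8, 3))
labels = np.arange(5)
batches = list(augmented_batches(images, labels, 2, rng))
```

Because a new random transform is drawn every time an image is served, the network rarely sees exactly the same input twice, which is one reason augmentation can reduce overfitting.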

Visualize predictions using Model 2

Transfer Learning

With transfer learning, Charlock, Common Chickweed, and Small-flowered Cranesbill have the highest per-class accuracy, and Black-grass has the lowest.

All three models appear to overfit, since training accuracy is higher than both validation and test accuracy. Nevertheless, the CNN model with data augmentation achieves the best test accuracy (about 85%), followed by transfer learning with data augmentation (about 76%), while the CNN model without data augmentation scores lowest (about 73%).

Model 1 without data augmentation has a precision score of 73%, recall score of 73%, F1 score of 72%, and accuracy of 73%.

Model 2 with data augmentation has a precision score of 85%, recall score of 85%, F1 score of 84%, and accuracy of 85%.

Model 3 with transfer learning has a precision score of 76%, recall score of 76%, F1 score of 75%, and accuracy of 76%.
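
The text does not say how these scores were averaged across the 12 classes. Recall matching accuracy for every model is consistent with support-weighted averaging (scikit-learn's `average='weighted'` option), because weighted recall adds up each class's true positives and divides by the total number of samples, which is exactly accuracy. The following is a dependency-free sketch under that assumption; `weighted_prf` and the toy labels are hypothetical.

```python
from collections import Counter

def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over all classes, plus accuracy."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support[c] if support[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[c] / n  # classes absent from y_true get weight 0
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return precision, recall, f1, accuracy

# Toy labels (hypothetical): recall comes out equal to accuracy, as in the reported results.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 1, 1, 1, 1, 2, 2, 0]
p, r, f, a = weighted_prf(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f} accuracy={a:.2f}")
# precision=0.58 recall=0.60 f1=0.56 accuracy=0.60
```

Weighted F1 is not forced to equal accuracy, which fits the F1 scores sitting one point below the other metrics.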

Conclusion

Scope of Improvement